L-SR1: A Second Order Optimization Method for Deep Learning

Authors

  • Vivek Ramamurthy
  • Nigel Duffy
Abstract

We describe L-SR1, a new second order method to train deep neural networks. Second order methods hold great promise for distributed training of deep networks. Unfortunately, they have not proven practical. Two significant barriers to their success are inappropriate handling of saddle points and poor conditioning of the Hessian. L-SR1 is a practical second order method that addresses these concerns. We provide experimental results showing that L-SR1 performs at least as well as Nesterov's Accelerated Gradient Descent on the MNIST and CIFAR10 datasets. For the CIFAR10 dataset, we see competitive performance on shallow networks like LeNet5, as well as on deeper networks like residual networks. Furthermore, we perform an experimental analysis of L-SR1 with respect to its hyperparameters to gain greater intuition. Finally, we outline the potential usefulness of L-SR1 in distributed training of deep neural networks.
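The core building block of L-SR1 is the symmetric rank-one (SR1) quasi-Newton update. As a rough illustration only (not the paper's algorithm: L-SR1 keeps a limited history of update pairs, in the spirit of L-BFGS, rather than an explicit matrix), here is a minimal full-memory SR1 update in Python with NumPy, including the standard skip safeguard:

    import numpy as np

    def sr1_update(B, s, y, eps=1e-8):
        # Symmetric rank-one update of the Hessian approximation B.
        #   s = x_{k+1} - x_k   (step taken)
        #   y = g_{k+1} - g_k   (change in gradient)
        # Unlike BFGS, SR1 does not force B to stay positive definite,
        # so it can model the indefinite curvature found near saddle points.
        r = y - B @ s                       # residual of the secant condition
        denom = float(r @ s)
        # Standard safeguard: skip the update when the denominator is tiny.
        if abs(denom) > eps * np.linalg.norm(r) * np.linalg.norm(s):
            B = B + np.outer(r, r) / denom
        return B

    # Toy check on the quadratic f(x) = 0.5 x^T A x with an indefinite A
    # (a saddle point at the origin): two secant updates recover A exactly,
    # including its negative eigenvalue, which a positive definite BFGS
    # matrix could never represent.
    A = np.diag([3.0, -1.0])
    B = np.eye(2)
    for s in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
        y = A @ s                           # gradient change for a quadratic
        B = sr1_update(B, s, y)
    assert np.allclose(B, A)

Because the updated matrix may be indefinite, SR1-type methods are usually paired with a trust-region step rather than a plain line search; the sketch above shows only the update rule itself.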


Similar resources

A Hybrid Optimization Algorithm for Learning Deep Models

Deep learning is a subset of machine learning that is widely used in Artificial Intelligence (AI) fields such as natural language processing and machine vision. Its learning algorithms require optimization in multiple respects. Generally, model-based inference requires solving an optimization problem. In deep learning, the most important problem that can be solved by optimization is neural n...

Full text

Optimization Methods for Supervised Machine Learning: From Linear Models to Deep Learning

The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a su...

Full text

A symmetric rank-one quasi-Newton line-search method using negative curvature directions

We propose a quasi-Newton line-search method that uses negative curvature directions for solving unconstrained optimization problems. In this method, the symmetric rank-one (SR1) rule is used to update the Hessian approximation. The SR1 update rule is known to have good numerical performance; however, it does not guarantee positive definiteness of the updated matrix. We first discuss the deta...

Full text
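For reference, the SR1 update discussed in this abstract has the standard form (with step s_k = x_{k+1} - x_k and gradient change y_k = \nabla f(x_{k+1}) - \nabla f(x_k)):

    B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k}

The update is applied only when the denominator is sufficiently far from zero and is skipped otherwise. Because the rank-one correction can have either sign, B_{k+1} may become indefinite; this is precisely what makes negative curvature directions available to the line search.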

A Discrete Hybrid Teaching-Learning-Based Optimization algorithm for optimization of space trusses

In this study, two well-known algorithms are merged to enhance the optimization process, especially in the structural engineering field, in order to achieve an improved hybrid algorithm. These two algorithms are Teaching-Learning-Based Optimization (TLBO) and Harmony Search (HS), which have been widely used by researchers in varied fields of science. The hybridized algorithm is called A Di...

Full text

Journal:

Volume:   Issue:

Pages:  -

Publication date: 2017